Abstract. Recent techniques for inferring business relationships between
ASs [1, 2] have yielded maps that have extremely few invalid BGP
paths in the terminology of Gao [3]. However, some relationships in- 
ferred by these newer algorithms are incorrect, leading to the deduction 
of unrealistic AS hierarchies. We investigate this problem and discover 
what causes it. Having obtained such insight, we generalize the problem 
of AS relationship inference as a multiobjective optimization problem 
with node-degree-based corrections to the original objective function of 
minimizing the number of invalid paths. We solve the generalized ver- 
sion of the problem using the semidefinite programming relaxation of 
the MAX2SAT problem. Keeping the number of invalid paths small, we 
obtain a more veracious solution than that yielded by recent heuristics. 

1 Introduction 

As packets flow in the Internet, money also flows, not necessarily in the same di- 
rection. Business relationships between ASs reflect both flows, indicating a direc- 
tion of money transfer as well as a set of constraints to the flow of traffic. Knowing 
AS business relationships is therefore of critical importance to providers, ven- 
dors, researchers, and policy makers, since such knowledge sheds more light on 
the relative "importance" of ASs. 

The problem is also of multidimensional interest to the research community. 
Indeed, the Internet AS-level topology and its evolutionary dynamics result from 
business decisions among Internet players. Knowledge of AS relationships in the 
Internet provides a valuable validation framework for economy-based Internet 
topology evolution modeling, which in turn promotes deeper understanding of 
the fundamental laws driving the evolution of the Internet topology and its 
hierarchy. 

Unfortunately, the work on inferring AS relationships from BGP data has 
recently encountered difficulties. We briefly describe this situation in its historical 
context. 



Gao introduces the AS relationship inference problem in her pioneering pa- 
per [3]. This work approximates reality by assuming that any AS-link is of 
one of the following three types: customer-provider, peering, or sibling. If all 
ASs strictly adhere to import and export policies described in [3], then every 
BGP path must comply with the following hierarchical pattern: an uphill seg- 
ment of zero or more customer-to-provider or sibling-to-sibling links, followed by 
zero or one peer-to-peer links, followed by a downhill segment of zero or more 
provider-to-customer or sibling-to-sibling links. Paths with the described hierar- 
chical structure are deemed valid. After introducing insight about valid paths, 
Gao proposes an inference heuristic that identifies top providers and peering 
links based on AS degrees and valid paths. 
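The valid-path pattern described above can be sketched as a small state machine. This is only an illustration: the link-type labels and the function name `is_valid_path` are assumptions introduced here, not part of [3].

```python
# Link types as seen while walking a path from its first AS to its last.
# These string labels are hypothetical names chosen for this sketch.
C2P, P2C, P2P, SIB = "c2p", "p2c", "p2p", "sibling"


def is_valid_path(link_types):
    """Return True if the links form an uphill segment, at most one
    peer-to-peer link, and then a downhill segment."""
    phase = "uphill"
    for link in link_types:
        if link == SIB:
            # Sibling links are allowed in both the uphill and downhill segments.
            continue
        if link == C2P:
            if phase != "uphill":
                return False  # climbing again after the path has peaked
        elif link == P2P:
            if phase != "uphill":
                return False  # second peering link, or peering after descent
            phase = "downhill"
        elif link == P2C:
            phase = "downhill"
        else:
            raise ValueError(f"unknown link type: {link!r}")
    return True
```

For example, `is_valid_path([C2P, C2P, P2P, P2C])` holds, while a provider-to-customer link followed by a customer-to-provider link is rejected.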

In [4], Subramanian et al. (SARK) slightly relax the problem by not infer- 
ring sibling links, and introduce a more consistent and elegant mathematical 
formulation. The authors render the problem into a combinatorial optimization 
problem: given an undirected graph G derived from a set of BGP paths P, assign 
the edge type (customer-provider or peering) to every edge in G such that the 
total number of valid paths in P is maximized. The authors call the problem 
the type-of-relationship (ToR) problem, conjecture that it is NP-complete, and 
provide a simple heuristic approximation. 

Di Battista et al. (DPP) in [1] and independently Erlebach et al. (EHS) in [2]
prove that the ToR problem is indeed NP-complete. EHS also prove that it is even
harder, specifically APX-complete. 3 More importantly for practical purposes,
both DPP and EHS make the straightforward observation that peering edges 
cannot be inferred in the ToR problem formulation. Indeed, as the validation 
data presented by Xia et al. in [5] indicates, only 24.63% of the validated SARK 
peering links are correct. 

Even more problematic is the following dilemma. DPP (and EHS) come 
up with heuristics that outperform the SARK algorithm in terms of produc- 
ing smaller numbers of invalid paths [1,2]. Although these results seem to be a 
positive sign, closer examination of the AS relationships produced by the DPP 
algorithm [6] reveals that the DPP inferences are further from reality than the 
SARK inferences. In the next section we show that improved solutions to the ToR
problem do not yield practically correct answers and contain obviously misidentified
edges, e.g. well-known global providers appear as customers of small ASs.
As a consequence, we claim that improved solutions to the unmodified ToR 
problem do not produce realistic results. 

An alternative approach to AS relationship inference is to disregard BGP
paths and switch attention to other data sources (e.g. WHOIS) [7,8], but nothing
suggests that we have exhausted all possibilities of extracting relevant information
from BGP data. Indeed, in this study we seek to answer the following
question: can we adjust the original (ToR) problem formulation, so that an algorithmic
solution to the modified problem would yield a better answer from the
practical perspective?

3 There exists no polynomial-time algorithm approximating an APX-complete problem
above a certain inapproximability limit (ratio) dependent on the particular problem.

The main contribution of this paper is that we positively answer this question. 
We describe our approach and preliminary results in the subsequent two sections, 
and conclude by describing future directions of this work. 

2 Methodology 

2.1 Inspiration behind our approach 

The main idea behind our approach is to formalize our knowledge regarding why 
improved solutions to the ToR problem fail to yield practically right answers. 
To this end we reformulate the ToR problem as a multiobjective optimization 
problem introducing certain corrections to the original objective function. We 
seek a modification of the original objective function, such that the minimum of 
the new objective function reflects an AS relationship mapping that is closer to 
reality. 

2.2 Mapping to 2SAT 

To achieve this purpose, we start with the DPP and EHS results [1, 2] that deliver 
the fewest invalid paths. Suppose we have a set of BGP paths P from which we 
can extract the undirected AS-level graph G(V, E). We introduce direction to 
every edge in E from the customer AS to the provider AS. Directing edges in E 
induces direction of edges in P. A path in P is valid if it does not contain the 
following invalid pattern: a provider-to-customer edge followed by a customer-to- 
provider edge. The ToR problem is to assign direction to edges in E minimizing 
the number of paths in P containing the invalid pattern. 

The problem of identifying the directions of all edges in E making all paths 
in P valid — assuming such edge orientation exists — can be reduced to the 2SAT 
problem. 4 Initially, we arbitrarily direct all edges in E and introduce a boolean
variable $x_i$ for every edge $i$, $i = 1 \ldots |E|$. If the algorithms described below assign
the value true to $x_i$, then edge $i$ keeps its original direction, while assignment of
false to $x_i$ reverses the direction of $i$. We then split each path in P into pairs of
adjacent edges involving triplets of ASs (all 1-link paths are always valid) and 
perform mapping between the obtained pairs and 2-variable clauses as shown 
in Table 1. The mapping is such that only clauses corresponding to the invalid 
path pattern yield the false value when both variables are true. If there exists 
an assignment of values to all the variables such that all clauses are satisfied, 
then this assignment makes all paths valid. 
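The splitting of paths into 2-variable clauses can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of Table 1: the initial edge direction (from the smaller AS number to the larger) and the names `edge_key`, `points_to`, and `clauses_from_paths` are hypothetical. The sketch uses the observation that a triplet a, b, c is invalid exactly when both edges point away from the middle AS b, so the clause requires that at least one of them points toward b.

```python
def edge_key(u, v):
    """Arbitrary initial direction: tail is the smaller AS number, head the larger."""
    return (u, v) if u < v else (v, u)


def points_to(edge, node):
    """Literal (edge, polarity) that is true exactly when `edge` is directed
    toward `node`. Polarity True stands for x_edge (keep the initial
    direction), polarity False for its negation (reverse it)."""
    _tail, head = edge
    return (edge, head == node)


def clauses_from_paths(paths):
    """Collect one 2-literal clause per unique pair of adjacent edges."""
    clauses = set()
    for path in paths:
        for a, b, c in zip(path, path[1:], path[2:]):
            first, second = edge_key(a, b), edge_key(b, c)
            # Valid iff the first edge or the second edge points toward b.
            clause = tuple(sorted([points_to(first, b), points_to(second, b)]))
            clauses.add(clause)
    return clauses
```

Because the clauses are stored in a set with sorted literals, the same pair of adjacent links seen in several paths, or traversed in the opposite direction, contributes a single clause.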

4 2SAT is a variation of the satisfiability problem: given a set of clauses with two
boolean variables per clause $l_i \lor l_j$, find an assignment of values to variables satisfying
all the clauses. MAX2SAT is a related problem: find the assignment maximizing
the number of simultaneously satisfied clauses.



Table 1. Mapping between pairs of adjacent edges in P, 2SAT clauses, and edges in $G_{2SAT}$. The
invalid path pattern is in the last row.
To solve the 2SAT problem, we construct a dual graph, the 2SAT graph
$G_{2SAT}(V_{2SAT}, E_{2SAT})$, according to the rules shown in Table 1: every edge $i \in E$
in the original graph G gives birth to two vertices $x_i$ and $\bar{x}_i$ in $V_{2SAT}$, and every
pair of adjacent links $l_i \lor l_j$ in P, where literal $l_i$ ($l_j$) is either $x_i$ ($x_j$) or $\bar{x}_i$ ($\bar{x}_j$),
gives birth to two directed edges in $E_{2SAT}$: from vertex $\bar{l}_i$ to vertex $l_j$ and from
vertex $\bar{l}_j$ to vertex $l_i$. As shown in [9], there exists an assignment satisfying all
the clauses if there is no edge $i$ such that both of its corresponding vertices in
the 2SAT graph, $x_i, \bar{x}_i \in V_{2SAT}$, belong to the same strongly connected component 5
(SCC) in $G_{2SAT}$.

If an assignment satisfying all the clauses exists we can easily find it. We
perform topological sorting 6 $t$ on nodes in $V_{2SAT}$ and assign true or false to a
variable $x_i$ depending on if $t(\bar{x}_i) < t(x_i)$ or $t(x_i) < t(\bar{x}_i)$ respectively. All operations
described so far can be done in linear time.
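The construction above can be sketched as a standard linear-time 2SAT solver. This is an illustrative sketch, not the code used in [1, 2]: the literal encoding (literal 2i for the variable with index i, literal 2i + 1 for its negation) and the function name `solve_2sat` are assumptions made here, and Kosaraju's two-pass algorithm is swapped in as one common way to obtain SCCs in topological order.

```python
def solve_2sat(num_vars, clauses):
    """clauses: iterable of (a, b) literal pairs; literal 2*i is x_i and
    literal 2*i + 1 is NOT x_i. Returns a list of booleans, or None if
    some variable shares an SCC with its negation."""
    n = 2 * num_vars
    graph = [[] for _ in range(n)]
    reverse = [[] for _ in range(n)]
    for a, b in clauses:
        # Clause (a OR b) yields the implications NOT a -> b and NOT b -> a.
        for src, dst in ((a ^ 1, b), (b ^ 1, a)):
            graph[src].append(dst)
            reverse[dst].append(src)

    # Pass 1: iterative DFS, recording vertices in order of completion.
    order, seen = [], [False] * n
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if not seen[nxt]:
                    seen[nxt] = True
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing completion order.
    # Components are numbered in topological order of the implication graph.
    comp = [-1] * n
    current = 0
    for start in reversed(order):
        if comp[start] != -1:
            continue
        comp[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in reverse[node]:
                if comp[nxt] == -1:
                    comp[nxt] = current
                    stack.append(nxt)
        current += 1

    assignment = []
    for i in range(num_vars):
        if comp[2 * i] == comp[2 * i + 1]:
            return None  # x_i and its negation are in the same SCC
        # True when the negation precedes the variable in topological order.
        assignment.append(comp[2 * i + 1] < comp[2 * i])
    return assignment
```

The component numbers play the role of the topological sorting t: a variable is set to true when its negated vertex comes earlier in that order.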

2.3 MAX2SAT: DPP vs. EHS 

As soon as a set of BGP paths P is "rich enough," there is no assignment 
satisfying all clauses and making all paths valid. Furthermore, the ToR problem 
of maximizing the number of valid paths can be reduced to the MAX2SAT [1, 2] 
problem of maximizing the number of satisfied clauses. Making this observation, 
DPP propose a heuristic to find the maximal subset of paths $P_S \subset P$ such that
all paths in $P_S$ are valid.

5 An SCC is a set of nodes in a directed graph s.t. there exists a directed path between
every ordered pair of nodes.

6 Given a directed graph G(V, E), function $t : V \to \mathbb{R}$ is topological sorting if
$t(i) < t(j)$ for every ordered pair of nodes $i, j \in V$ s.t. there exists a directed path
from $i$ to $j$.

EHS use a different approach. They first direct the edges $i \in E$ that can be directed
without causing conflicts. Such edges correspond to vertices $x_i, \bar{x}_i \in V_{2SAT}$
that have indegree or outdegree zero. Then EHS iteratively remove edges directed
as described above and strip P, G, and G2SAT accordingly. This procedure signif- 
icantly shortens the average path length in P, which improves the approximation 
of ToR by MAX2SAT. Finally, they approximate MAX2SAT to find a solution 
to the ToR problem. 
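The iterative stripping step can be sketched at the clause level. This is a hedged illustration rather than the EHS implementation: it assumes that a variable whose literals all have the same polarity corresponds to a pair of 2SAT-graph vertices with indegree or outdegree zero, and the clause representation (tuples of `(variable, polarity)` literals) and the name `strip_non_conflict` are hypothetical.

```python
def strip_non_conflict(clauses):
    """Repeatedly fix variables that occur with a single polarity and drop
    every clause they satisfy. Returns (fixed, remaining) where `fixed`
    maps variable -> boolean and `remaining` is the stripped clause set."""
    remaining = set(clauses)
    fixed = {}
    while True:
        polarities = {}
        for clause in remaining:
            for var, positive in clause:
                polarities.setdefault(var, set()).add(positive)
        # A variable seen with one polarity only can be set without conflict.
        pure = {var: next(iter(signs))
                for var, signs in polarities.items() if len(signs) == 1}
        if not pure:
            return fixed, remaining
        fixed.update(pure)
        remaining = {clause for clause in remaining
                     if not any(var in pure for var, _ in clause)}
```

Removing clauses can make further variables conflict-free, which is why the loop runs until no such variable is left.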

2.4 Solving MAX2SAT with SDP 

The MAX2SAT problem is NP- and APX-complete [10], but Goemans and
Williamson (GW) [11] construct a famous approximation algorithm that uses
semidefinite programming (SDP) and delivers an approximation ratio of 0.878.
The best approximation ratio currently known is 0.940, due to improvements
to GW by Lewin, Livnat, and Zwick (LLZ) in [12]. Note that this approximation
ratio is close to the MAX2SAT inapproximability limit of $21/22 \approx 0.954$ [13].

To cast a MAX2SAT problem with $m_2$ clauses involving $m_1$ literals (variables
$x_i$ and their negations $\bar{x}_i$, $i = 1 \ldots m_1$) to a semidefinite program, we first
get rid of negated variables by introducing $m_1$ variables $x_{m_1+i} = \bar{x}_i$. Then we
establish mapping between boolean variables $x_k$, $k = 1 \ldots 2m_1$, and $2m_1 + 1$ auxiliary
variables $y_0, y_k \in \{-1, 1\}$, $y_{m_1+i} = -y_i$, using formula $x_k = (1 + y_0 y_k)/2$.
This mapping guarantees that $x_k = \text{true} \Leftrightarrow y_k = y_0$ and $x_k = \text{false} \Leftrightarrow y_k = -y_0$.
Given the described construction, we call $y_0$ the truth variable. After trivial algebra,
the MAX2SAT problem becomes the maximization problem for the sum
$\frac{1}{4} \sum_{k,l=1}^{2m_1} w_{kl} (3 + y_0 y_k + y_0 y_l - y_k y_l)$, where weights $w_{kl}$ are either 1 if clause
$x_k \lor x_l$ is present in the original MAX2SAT instance or 0 otherwise. Hereafter
we fix the notations for indices $i, j = 1 \ldots m_1$ and $k, l = 1 \ldots 2m_1$.
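The "trivial algebra" can be checked by brute force over all sign assignments. The sketch below, with hypothetical function names, verifies that $(3 + y_0 y_k + y_0 y_l - y_k y_l)/4$ equals 1 exactly when the clause $x_k \lor x_l$ is satisfied and 0 otherwise, using $x_k = (1 + y_0 y_k)/2$.

```python
from itertools import product


def relaxed_clause_value(y0, yk, yl):
    """Per-clause term of the objective for y0, yk, yl in {-1, +1}."""
    return (3 + y0 * yk + y0 * yl - yk * yl) / 4


def identity_holds():
    """Check the term against the truth value of (x_k OR x_l) for all
    eight sign combinations."""
    for y0, yk, yl in product((-1, 1), repeat=3):
        xk = (1 + y0 * yk) // 2  # 1 if yk == y0 (true), 0 otherwise
        xl = (1 + y0 * yl) // 2
        expected = 1 if (xk or xl) else 0
        if relaxed_clause_value(y0, yk, yl) != expected:
            return False
    return True
```

Summing this term over all clauses with weights $w_{kl}$ therefore counts the satisfied clauses.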

The final transformation to make the problem solvable by SDP is relaxation.
Relaxation involves mapping variables $y_0, y_k$ to $2m_1 + 1$ unit vectors $v_0, v_k \in \mathbb{R}^{m_1+1}$
fixed at the same origin, so that all vector ends lie on the unit sphere $S^{m_1}$. The problem
is to maximize the sum composed of vector scalar products:

$$\max \; \frac{1}{4} \sum_{k,l=1}^{2m_1} w_{kl} \, (3 + v_0 \cdot v_k + v_0 \cdot v_l - v_k \cdot v_l) \tag{1}$$

$$\text{s.t.} \quad v_0 \cdot v_0 = v_k \cdot v_k = 1, \quad v_i \cdot v_{m_1+i} = -1, \quad k = 1 \ldots 2m_1, \; i = 1 \ldots m_1.$$

Interestingly, this problem, solvable by SDP, is equivalent to the following
minimum energy problem in physics. Vectors $v_0, v_k$ point to the locations of particles
$p_0, p_k$ freely moving on the sphere $S^{m_1}$ except that particles $p_i$ and $p_{m_1+i}$
are constrained to lie opposite on the sphere. For every MAX2SAT clause $x_k \lor x_l$,
we introduce three constant forces of equal strength (see Fig. 1): one repulsive
force between particles $p_k$ and $p_l$, and two attractive forces: between $p_k$ and $p_0$,
and between $p_l$ and $p_0$. The truth particle $p_0$ thus attracts all other particles $p_k$ with
the forces proportional to the number of clauses containing $x_k$. The goal is to
find the location of particles on the sphere minimizing the potential energy of
the system. If we built such a mission-specific computer in the lab, it would solve
this problem in constant time. SDP solves it in polynomial time.




Fig. 1. The semidefinite programming relaxation to the MAX2SAT problem. Point $p_0$ (corresponding
to vector $v_0$ from the text) is the truth point. It attracts both points $p_k$ and $p_l$ representing
the boolean variables from the clause $x_k \lor x_l$. Points $p_k$ and $p_l$ repel each other. The problem is to
identify the locations of all points on the sphere that minimize the potential energy of the system.
Given an orientation by SDP, we cut the system by a random hyperplane and assign value true to
the variables corresponding to points lying on the same side of the hyperplane as the truth $p_0$.

To extract the solution for the MAX2SAT problem from the solution obtained
by SDP for the relaxed problem, we perform rounding. Rounding involves cutting
the sphere by a randomly oriented hyperplane containing the sphere center. We
assign value true (false) to variables $x_k$ corresponding to vectors $v_k$ lying on the
same (opposite) side of the hyperplane as the truth vector $v_0$. GW prove that the
solution to the MAX2SAT problem obtained this way delivers the approximation
ratio of 0.878 [11]. We can also rotate the vector output obtained by SDP before
rounding and skew the distribution of the hyperplane orientation to slightly
prefer the orientation perpendicular to $v_0$. These two techniques, explored to
their greatest depths by LLZ, improve the approximation ratio up to 0.9401 [12].
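The plain rounding step can be sketched as below. This illustrates only the uniform random-hyperplane cut of GW; the pre-rounding rotation and skewed hyperplane distribution of LLZ are not implemented. The function name `round_by_hyperplane` and the plain-list vector representation are assumptions made for this sketch.

```python
import random


def round_by_hyperplane(v0, vectors, rng=random):
    """Cut the sphere with a random hyperplane through the origin and
    return True for every vector on the same side as the truth vector v0."""
    # A vector of independent Gaussians gives a uniformly random direction,
    # used here as the normal of the hyperplane.
    normal = [rng.gauss(0.0, 1.0) for _ in v0]

    def side(v):
        return sum(n * x for n, x in zip(normal, v)) >= 0.0

    truth_side = side(v0)
    return [side(v) == truth_side for v in vectors]
```

A vector equal to `v0` is always rounded to true and its opposite to false; vectors in between are split with a probability that grows with their angle to `v0`.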

2.5 Analysis of the unperturbed solution 

We now have the solution to the original ToR problem and are ready to an- 
alyze it. While the number of invalid paths is small [2], the solution is not 
perfect — some inferred AS relationships are not in fact accurate. What causes 
these misclassifications? 

First, some edges may be directed either way resulting in exactly the same
number of invalid paths; such edges are directed randomly. To exemplify, consider
path $p \in P$, $p = \{i_1 i_2 \ldots i_{|p|-1} j\}$, $i_1, i_2, \ldots, i_{|p|-1}, j \in E$, and suppose that
the last edge $j$ appears only in one path (that is, $p$) and that it is from some
large provider (like UUNET) to a small customer. Suppose that other edges
$i_1, i_2, \ldots, i_{|p|-1}$ appear in several other paths and that they are correctly inferred
as customer-to-provider. In this scenario both orientations of edge $j$ (i.e. correct
and incorrect: provider-to-customer and customer-to-provider) render path $p$
valid. Thus, edge $j$ is directed randomly, increasing the likelihood of an incorrect
inference. We can find many incorrect inferences of this type in our experiments
in the next section and in [6], e.g. well-known large providers like UUNET,
AT&T, Sprintlink, Level3, are inferred as customers of smaller ASs like AS1 (AS
degree 67), AS2685 (2), AS8043 (1), AS13649 (7), respectively.
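The tie described in this first case can be reproduced on a toy example. The AS labels (`stub`, `regional`, `tier1`, `small`) and the helper names are hypothetical; the only rule encoded is the ToR invalid pattern, a provider-to-customer link followed by a customer-to-provider link.

```python
def is_invalid(path, provider_of):
    """provider_of maps frozenset({u, v}) to the AS acting as provider on
    that link. A path is invalid if it descends and then climbs again."""
    for a, b, c in zip(path, path[1:], path[2:]):
        downhill_first = provider_of[frozenset((a, b))] == a  # a -> b is p2c
        uphill_second = provider_of[frozenset((b, c))] == c   # b -> c is c2p
        if downhill_first and uphill_second:
            return True
    return False


def count_invalid(paths, provider_of):
    return sum(is_invalid(path, provider_of) for path in paths)


# One path whose last link (tier1, small) appears nowhere else.
paths = [["stub", "regional", "tier1", "small"]]
uphill = {
    frozenset(("stub", "regional")): "regional",
    frozenset(("regional", "tier1")): "tier1",
}
correct = dict(uphill)
correct[frozenset(("tier1", "small"))] = "tier1"  # large provider, small customer
wrong = dict(uphill)
wrong[frozenset(("tier1", "small"))] = "small"    # reversed relationship
```

Both `count_invalid(paths, correct)` and `count_invalid(paths, wrong)` are zero, so an objective that counts only invalid paths cannot distinguish the two orientations.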



Second, not all edges are customer-to-provider or provider-to-customer. In 
particular, trying to direct sibling edges leads to proliferation of error. Indeed, 
when the only objective is to maximize the number of valid paths, directing 
a sibling edge brings the risk of misdirecting the dependent edges sharing a 
clause with the sibling edge. To clarify, consider path $p \in P$, $p = \{ij\}$, $i, j \in E$,
and suppose that in reality $i$ is a sibling edge that appears in multiple paths
and that $j$ is a customer-to-provider edge that appears only in one path $p$.
The algorithm can classify edge $i$ either as customer-to-provider or provider-to-customer
depending on the structure of the paths in which it appears. If
this structure results in directing i as provider-to-customer, then the algorithm 
erroneously directs edge j also as provider-to-customer to make path p valid. In 
other words, the outcome is that we maximize the number of valid paths at the 
cost of inferring edge j incorrectly. 

We can conclude that the maximum number of valid paths does not correspond
to a correct answer because, as illustrated in the above two examples, it
can result in misinferred links. Specifically, in the presence of multiple solutions
there is nothing in the objective function to require the algorithm to prefer
the proper orientation for edge j. Our next key question is: Can we adjust the 
objective function to infer the edge direction correctly? 

2.6 Our new generalized objective function 

A rigorous way to pursue the above question is to add to the objective function 
some small modifier selecting the correct edge direction for links unresolved by 
the unperturbed objective function. Ideally this modifier should be a function 
of "AS importance," such as the relative size of the customer tree of an AS. 
Unfortunately, defined this way the modifier is a function of the end result, edge 
orientation, which makes the problem intractable (i.e. we cannot solve it until 
we solve it). 

The simplest correcting function that does not depend on the edge direction
and is still related to perceived "AS importance" is the AS degree "gradient" in
the original undirected graph G: the difference between node degrees of adjacent
ASs. In the examples from the previous subsection, the algorithm that is trying
not only to minimize the number of invalid paths but also to direct edges from
adjacent nodes of lower degrees to nodes of higher degrees will effectively have
an incentive to correctly infer the last edge $j \in p$.

More formally, we modify the objective function as follows. In the original
problem formulation, weights $w_{kl}$ for 2-link clauses $x_k \lor x_l$ (pairs of adjacent
links in P) are either 0 or 1. We first alter them to be either 0, if pair $\{kl\} \notin P$,
or $w_{kl}(\alpha) = c_2 \alpha$ otherwise. The normalization coefficient $c_2$ is determined from
the condition $\sum_{k \neq l} w_{kl}(\alpha) = \alpha \Rightarrow c_2 = 1/m_2$ (recall that $m_2$ is the number of
2-link clauses), and $\alpha$ is an external parameter, $0 \le \alpha \le 1$, whose meaning we
explain below.

In addition, for every edge $i \in E$, we introduce a 1-link clause weighted by a
function of the node degree gradient. More specifically, we initially orient every
edge $i \in E$ along the node degree gradient: if $d_i^-$ and $d_i^+$, $d_i^- \le d_i^+$, are degrees of
nodes adjacent to edge $i$, we direct $i$ from the $d_i^-$-degree node to the $d_i^+$-degree
node, for use as input to our algorithm. 7

Then, we add 1-link clauses $x_i \lor x_i$, $\forall i \in E$, to our MAX2SAT instance, and
we weight them by $w_{ii}(\alpha) = c_1 (1 - \alpha) f(d_i^+, d_i^-)$. The normalization coefficient $c_1$
is determined from the condition $\sum_i w_{ii}(\alpha) = 1 - \alpha$, and the function $f$ should
satisfy the following two conditions: 1) it should "roughly depend" on the relative
node degree gradient $(d_i^+ - d_i^-)/d_i^+$; and 2) it should provide higher values
for node pairs with the same relative degree gradient but higher absolute de- 
gree values. The first condition is transparent: we expect that an AS with node 
degree 5, for example, is more likely a customer of an AS with node degree 10 
than a 995-degree AS is a customer of a 1000-degree AS. The second condition 
is due to the fact that we do not know the true AS degrees: we approximate 
them by degrees of nodes in our BGP-derived graph G. The graphs derived from 
BGP data have a tendency to underestimate the node degree of small ASs, while 
they yield more accurate degrees for larger ASs [14]. Because of the larger error 
associated with small ASs, an AS with node degree 5, for example, is less likely 
a customer of an AS with node degree 10 than a 500-degree AS is a customer of 
a 1000-degree AS. 
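To make the two conditions concrete, the sketch below shows one hypothetical candidate for $f$. It is an assumption chosen purely for illustration and is not claimed to be the function selected in this paper: a normalized degree gradient multiplied by a logarithm of the combined degree, so that equal relative gradients score higher at larger absolute degrees.

```python
import math


def degree_gradient_weight(d_minus, d_plus):
    """Hypothetical f(d+, d-): grows with the relative degree gradient and,
    for a fixed relative gradient, with the absolute degrees."""
    if not 0 < d_minus <= d_plus:
        raise ValueError("expected 0 < d_minus <= d_plus")
    relative_gradient = (d_plus - d_minus) / (d_plus + d_minus)
    return relative_gradient * math.log(d_plus + d_minus)
```

With this candidate, the degree pair (5, 10) scores higher than (995, 1000), matching the first condition, and (500, 1000) scores higher than (5, 10), matching the second.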

We select the following function satisfying the two criteria described above: 



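In summary, our new objective function looks exactly as the one in (1), but
with different weights on clauses:

$$w_{kl}(\alpha) = \begin{cases} c_2 \, \alpha & \text{if } k \neq l \text{ and } \{kl\} \in P, \\ c_1 (1 - \alpha) f(d_i^+, d_i^-) & \text{if } k = l = i, \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$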



Now we can explain the role of the parameter α. Since $\sum_{k \neq l} w_{kl}(\alpha) = \alpha$
and $\sum_{k = l} w_{kl}(\alpha) = 1 - \alpha$, parameter α measures the relative importance of sums
of all 2- and 1-link clauses. If α = 1, then the problem is equivalent to the original
unperturbed ToR problem: only the number of invalid paths matters. If α = 0,
then, similar to Gao, only node degrees matter. Note that in the terminology of
multiobjective optimization, we consider the simplest scalar method of weighted
sums.
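The normalization of the two groups of weights can be sketched as follows. The function name `build_weights` and the dictionary-based representation are assumptions for illustration; the values of $f$ are taken as given inputs, and $c_1 = 1/\sum_i f$ follows from the condition that the 1-link weights sum to $1 - \alpha$.

```python
def build_weights(pairs, gradients, alpha):
    """pairs: set of 2-link clauses (any hashable keys).
    gradients: dict mapping each edge to its value of f (not all zero).
    Returns (w2, w1): weights of 2-link and 1-link clauses."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    c2 = 1.0 / len(pairs)               # makes the 2-link weights sum to alpha
    c1 = 1.0 / sum(gradients.values())  # makes the 1-link weights sum to 1 - alpha
    w2 = {pair: c2 * alpha for pair in pairs}
    w1 = {edge: c1 * (1.0 - alpha) * f for edge, f in gradients.items()}
    return w2, w1
```

Setting `alpha` to 1 removes the degree term entirely, and setting it to 0 removes the path term, which are the two extremes discussed above.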

In our analogy with physics in Fig. 1, we have weakened the repulsive forces
among particles other than the truth particle $p_0$, and we have strengthened the
forces between $p_0$ and other particles. When α = 0, there are no repulsive forces,
the truth particle $p_0$ attracts all other particles to itself, and all the vectors
become collinear with $v_0$. Cut by any hyperplane, they all lie on the same side
as $v_0$, which means that all variables $x_i$ are assigned value true and all links $i$
remain directed along the node degree gradient in the output of our algorithm.

7 An initial direction along the node degree gradient does not affect the solution since 
any initial direction is possible. We select the node degree gradient direction to 
simplify stripping of non-conflict edges in the next section. 
Fig. 2. Percentage of valid paths, of edges directed as in the α = 0 case, and of edges directed as
in the α = 1 case for different values of α.

3 Results 



In our experiments, the BGP path set P is a union of BGP tables from Route- 
Views [15] and 18 BGP route servers from [16] collected on May 13, 2004. Paths 
of length 1 are removed since they are always valid. The total number of paths 
is 1,025,775 containing 17,557 ASs, 37,021 links, and 382,917 unique pairs of 
adjacent links. 

We first pre-process the data by discovering sibling links. For this purpose, 
we use a union of WHOIS databases from ARIN, RIPE, APNIC, and LACNIC
collected on June 10, 2004. We say that two ASs belong to the same organization
if, in the WHOIS database, they have exactly the same organization names, or
names different only in the last digits, e.g. "ATT-37" and "ATT-38," or very
similar names, e.g. "UUNET South Africa" and "UUNET Germany." We infer 
links in P between adjacent ASs belonging to the same organization as sibling. 
We find 211 sibling links in our dataset, which we ignore in subsequent steps. 
More precisely, we do not assign boolean variables to them. 

We then direct the remaining links in the original graph G along the node
degree gradient, assign boolean variables to them, and construct the dual $G_{2SAT}$
graph. After directing edge $i$ along the node degree gradient, we check whether
this direction satisfies all clauses containing $l_i$ ($x_i$ or $\bar{x}_i$). If so, we then remove
the edge and strip P, G, and $G_{2SAT}$ accordingly. In this case we say that edge $i$
causes no conflicts because the value of the corresponding literal $l_i$ satisfies all
the clauses in which $l_i$ appears, independent of the values of all other literals
sharing the clauses with $l_i$. A non-conflict edge has two corresponding vertices
in the $G_{2SAT}$ graph, $x_i$ and $\bar{x}_i$. It follows from the construction of the $G_{2SAT}$
graph that $x_i$ has an outdegree of zero and $\bar{x}_i$ has an indegree of zero. We repeat
the described procedure until we cannot remove any more edges. The stripped
graph G has 1,590 vertices (9% of the original |V|) and 4,249 edges (11% of the
original |E|). The stripped $G_{2SAT}$ graph has 8,498 vertices and 46,920 edges.
In summary, we have 4,249 ($m_1$) 1-link clauses and 23,460 ($m_2$) 2-link clauses.
We feed this data into a publicly available SDP solver DSDP v4.7 [17], reusing
parts of the code from [2] and utilizing the LEDA v4.5 software library [18]. We
incorporate the pre-rounding rotation and skewed distribution of hyperplane
orientation by LLZ [12].

Table 2. Hierarchical ranking of ASs. The position depth (the number of ASs at the levels above)
and width (the number of ASs at the same level) of the top five ASs in the α = 0 and α = 1 cases
are shown for different values of α. The customer leaf ASs are marked with asterisks.

                                     α = 0.0      α = 0.2      α = 0.5      α = 0.8      α = 1.0
AS #   name              degree  dep.   wid.  dep.   wid.  dep.   wid.  dep.   wid.  dep.   wid.
701    UUNET               2373     0      1     0    173     1    232     1    252    17    476
1239   Sprint              1787     1      1     0    173     1    232     1    252    17    476
7018   AT&T                1723     2      1     0    173     1    232     1    252    17    476
3356   Level 3             1085     3      1     0    173     1    232     1    252    17    476
209    Qwest               1072     4      1     0    173     1    232     1    252    17    476
3643   Sprint Austr.         17   194      1   222      1   250      1   268      1     0      4
6721   Nextra Czech Net       3  1742    941   833     88   868     90   884     89     0      4
11551  Pressroom Ser.         2  1742    941  1419    398  1445    390  1457    386     0      4
1243   Army Systems           2  2683 14725*  2753 14655*  1445    390  1457    386     0      4
6712   France Transpac        2  2683 14725*  2753 14655*   292      3     1    252     4     13



Fig. 2 shows results of edge orientations we derive for different values of α
in (3). Specifically, the figure shows the percentage of valid paths, edges directed
as in the α = 0 case, and edges directed as in the α = 1 case. In the particular
extreme case of α = 1, the problem reduces to the original ToR problem considered
by DPP and EHS, and its solution yields the highest percentage of valid
paths, 99.67%. By decreasing α, we increase preference to directing edges along
the node degree gradient, and at the other extreme of α = 0, all edges become
directed along the node degree gradient, but the percentage of valid paths drops to 92.95%.

Note that changing α from 0 to 0.1 redirects 1.64% of edges, which leads
to a significant 6.53% increase in the number of valid paths. We also observe
that the tweak of α from 1 to 0.9 redirects 2.56% of edges without causing any
significant decrease (only 0.008%) in the number of valid paths. We find that
most of these edges are directed randomly in the α = 1 case because oriented
either way they yield the same number of valid paths. In other words, the AS 
relationships represented by these edges cannot be inferred by minimizing the 
number of invalid paths. 

We also rank ASs by means of our inference results with different α values.
To this end we split all ASs into hierarchical levels as follows. We first order all 
ASs by their reachability — that is, the number of ASs that a given AS can reach 
"for free" traversing only provider-to-customer edges. We then group ASs with 
the same reachability into levels. ASs at the highest level can reach all other ASs 
"for free." ASs at the lowest level have the smallest reachability (fewest "free" 
destinations). Then we define the position depth of AS X as the number of ASs 
at the levels above the level of AS X. The position width of AS X is the number 
of ASs at the same level as AS X. 
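The ranking procedure can be sketched as below. The input format (a dictionary from each AS to its direct customers) and the function name `rank_ases` are assumptions for illustration; reachability is computed by a simple traversal per AS, which is adequate for a sketch but not tuned for graphs of the size used here.

```python
from collections import Counter


def rank_ases(customers):
    """customers: dict mapping an AS to an iterable of its direct customers.
    Returns a dict mapping every AS to its (position depth, position width)."""
    nodes = set(customers)
    for direct in customers.values():
        nodes.update(direct)

    def reachability(start):
        # Number of other ASs reachable over provider-to-customer edges only.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for customer in customers.get(node, ()):
                if customer != start and customer not in seen:
                    seen.add(customer)
                    stack.append(customer)
        return len(seen)

    reach = {node: reachability(node) for node in nodes}
    level_sizes = Counter(reach.values())  # one level per reachability value

    depth_of, above = {}, 0
    for value in sorted(level_sizes, reverse=True):
        depth_of[value] = above            # ASs at strictly higher levels
        above += level_sizes[value]
    return {node: (depth_of[reach[node]], level_sizes[reach[node]])
            for node in nodes}
```

For a toy hierarchy `{"T1": ["R1", "R2"], "R1": ["S1"], "R2": ["S1", "S2"]}`, the top AS `T1` gets depth 0 and width 1, while the two leaf ASs share the lowest level with depth 3 and width 2.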

Table 2 shows the results of our AS ranking. For different values of α, we track
the positions of the top five ASs in the α = 0 and α = 1 cases. In the former case,
well-known large ISPs are at the top, but the number of invalid paths is relatively 
large, cf. Fig. 2. In the latter case delivering the solution to the unperturbed 



ToR problem, ASs with small degrees occupy the top positions in the hierarchy. 
These ASs appear in much lower positions when α ≠ 1. Counter to reality, the
large ISPs are not even near the top of the hierarchy. We observe that the
depth 8 of these large ASs increases as α approaches 1, indicating an increasingly
stronger deviation from reality. The deviation is maximized when α = 1. This
observation highlights the limitation of the ToR problem formulation based
solely on maximization of the number of valid paths. 

4 Conclusion and future work 

Using a standard multiobjective optimization method, we have constructed a 
natural generalization of the known AS relationship inference heuristics. We 
have extended the combinatorial optimization approach based on minimization 
of invalid paths, by incorporating AS-degree-based information into the problem 
formulation. Utilizing this technique, we have obtained first results that are more 
realistic than the inferences produced by the recent state-of-the-art heuristics [1, 
2]. We conclude that our approach opens a promising path toward increasingly 
veracious inferences of business relationships between ASs. 

The list of open issues that we plan to address in our future work includes: 
1) modifications to the algorithm to infer peering; 2) careful analysis of the trade- 
off surface [19] of the problem, required for selecting the value of the external 
parameters (e.g. α) corresponding to the right answer; 3) detailed examination
of the structure of the AS graph directed by inferred AS relationships; 4) vali- 
dation considered as a set of constraints narrowing the range of feasible values 
of external parameters; and 5) investigation of other AS-ranking mechanisms 
responsible for the structure of the inferred AS hierarchy. 